THE MULTIVARIATE COVERING LEMMA
AND ITS CONVERSE 
Abstract. The multivariate covering lemma states that given a collection of
k codebooks, each of sufficiently large cardinality and independently generated
according to one of the marginals of a joint distribution, one can with probability
arbitrarily close to one choose one codeword from each codebook such
that the resulting k-tuple of codewords is jointly typical with respect to the
joint distribution. Prior proofs of the multivariate covering lemma primarily 
employ strong typicality. We give a proof of this lemma for weakly typical sets. 
This allows achievability proofs that rely on the covering lemma to go through 
for continuous (e.g., Gaussian) channels without the need for quantization. 
The covering lemma and its converse are widely used in information theory, 
including in rate-distortion theory and in achievability results for multi-user 
channels. 


1. Introduction 

The covering lemma and its extensions play a crucial role in achievability results 
in network information theory. Covering lemmas are useful for enabling network 
nodes to transmit codewords that “look like” they are generated from a dependent 
distribution, whereas in reality, they are carefully selected from sufficiently large 
codebooks that are independently generated. This allows nodes to obtain the benefits
of both independent and dependent codewords: like independent codewords,
such codewords can be decoded in different locations; like dependent codewords,
they have the potential to achieve rates higher than those achieved by independent
codewords. This benefit, however, comes at a cost in rate. Thus the strategy is
useful when the benefit of transmitting dependent codewords exceeds its cost.

In the context of the covering lemma, the concept of “looking like” dependent 
codewords is captured by the notion of being jointly typical with respect to a 
dependent distribution. As there are various ways to define the typical set (here 
we specifically focus on weakly typical [2] and strongly typical sets 0) , one may 
ask whether a specific version of the covering lemma holds for a given definition of 
the typical set. The weakly typical set has two advantages over the strongly typical 
set. First, it is easily defined for continuous (e.g., Gaussian) distributions. Second, 
the weakly typical set has a simple one-shot counterpart, which allows proofs using 
the weakly typical set to be written in the one-shot framework in a simple manner. 
On the other hand, some results hold for the strongly typical set that do not hold 
for the weakly typical set. Thus it is helpful to review the covering lemma and its 
extensions and see for which definition of the typical set each result is currently 
known to hold. 

The simplest case of the covering lemma is the situation where given a random 
vector and an independently generated codebook, a node looks for a codeword in 
the codebook that is jointly typical (with respect to a dependent distribution) with
the given random vector. The result obtained in this case, simply referred to as the
“covering lemma”, appears in the achievability proof of the rate distortion theorem
using weakly typical sets [2]. The second case, called the “mutual covering lemma,”
treats the case where given two independently generated codebooks, a node looks 
for a jointly typical pair of codewords, where each codeword is from one of the 
codebooks. This result is used in Marton’s inner bound for the two-user broadcast 
channel and is proved for strongly typical sets BIZI- Recently, by extending the 
proof of [2], the authors of 00 prove a one-shot version of the mutual covering 
lemma. This proof can be used to show the validity of the mutual covering lemma 
for weakly typical sets in the asymptotic setting. The proof in 00, however, 
requires stronger independence assumptions on the codebooks than the proof using 
strongly typical sets in 00- Finally, the “multivariate covering lemma” is the 
extension of the mutual covering lemma to k independently generated codebooks, 
and can be used to obtain an inner bound on the broadcast channel with k users 
0. As stated in [3], one can show this result holds for strongly typical sets by 
extending the proof of the mutual covering lemma [4]. 

In this work, using the general strategy of El Gamal and Van der Meulen [4]
and some ideas regarding weakly typical sets from Koetter, Effros, and Médard
[5], we give a proof of the multivariate covering lemma for weakly typical sets.
We also provide a converse, a special case of which is usually referred to as the 
packing lemma 0. We remark that while similar to the argument in 0, we use 
Chebyshev’s inequality for the direct result (Section 4), it is also possible to use
the Cauchy-Schwarz inequality (see Appendix A), which leads to a tighter
upper bound.


2. Problem Statement 

For every positive integer $n$, define the set $[n] = \{1, \dots, n\}$. Let $k$ be a positive
integer and

\[ p(u_0, u_1, \dots, u_k, u_{k+1}) \]

be a probability distribution on the set

\[ \prod_{j=0}^{k+1} \mathcal{U}_j. \]

For every nonempty $S \subseteq [k]$ define

\[ \mathcal{U}_S = \prod_{j \in S} \mathcal{U}_j. \]

For every $j \in [k]$, let $M_j$ be a nonnegative integer. For every nonempty $S \subseteq [k]$,
define the set $\mathcal{M}_S$ as

\[ \mathcal{M}_S = \prod_{j \in S} [M_j], \]

and let $\mathcal{M} = \mathcal{M}_{[k]}$. For every $m = (m_1, \dots, m_k) \in \mathcal{M}$, let the random vector

\[ (U_0, U_1(m_1), \dots, U_k(m_k), U_{k+1}) \]

have distribution

\[ p(u_0) \prod_{j=1}^{k+1} p(u_j | u_0), \]

where $p(u_0)$ and each $p(u_j | u_0)$ are the conditional marginals of $p(u_0, \dots, u_{k+1})$. In
addition, let $\mathcal{F}$ be an arbitrary subset of $\mathcal{U}_0 \times \mathcal{U}_{[k+1]}$. We want to find upper and
lower bounds on the probability

\[ P\{\forall m \in \mathcal{M} : (U_0, U_1(m_1), \dots, U_k(m_k), U_{k+1}) \notin \mathcal{F}\}. \]

We derive the lower bound (Section 3) using the union bound, which does not
depend on the statistical dependencies of the vectors

\[ (U_0, U_1(m_1), \dots, U_k(m_k), U_{k+1}) \]

for different values of $m$. For the upper bound (Section 4), which leads to the
multivariate covering lemma, we require a stronger assumption, which we next
describe.

Let $m = (m_j)_{j \in [k]}$ and $m' = (m'_j)_{j \in [k]}$ be in $\mathcal{M}$. Define the set $S_{m,m'}$ as

\[ S_{m,m'} = \{ j \in [k] : m_j = m'_j \}. \]

When $m$ and $m'$ are clear from context, we denote $S_{m,m'}$ with $S$. In the proof of
the upper bound we require

\[ P\{\forall j \in [k] : U_j(m_j) = u_j \text{ and } U_j(m'_j) = u'_j \mid U_0 = u_0, U_{k+1} = u_{k+1}\} = \prod_{j \in S^c} p(u'_j | u_0) \times \prod_{j=1}^{k} p(u_j | u_0), \]

for all $u_0$ and all $(u_j)_j$ and $(u'_j)_j$ such that if $j \in S$, then $u_j = u'_j$ (Assumption I).
Note that if there exists a $j \in S$ where $u_j \neq u'_j$ then the probability on the left
hand side equals zero.

In the corresponding asymptotic problem (Section 5), we apply our bounds to

\[ P\{\forall m : (U_0^n, U_1^n(m_1), \dots, U_k^n(m_k), U_{k+1}^n) \notin A_\delta^{(n)}\}, \]

where for every $m$,

\[ (U_0^n, U_1^n(m_1), \dots, U_k^n(m_k), U_{k+1}^n) \]

is simply $n$ i.i.d. copies of the original random vector

\[ (U_0, U_1(m_1), \dots, U_k(m_k), U_{k+1}) \]

(Assumption II), and $A_\delta^{(n)}$ is the weakly typical set for the distribution $p(u_0, u_1, \dots, u_k, u_{k+1})$.
Our main result follows.

Theorem 1 (Multivariate Covering Lemma). Suppose Assumptions (I) and (II)
hold for the joint distribution of

\[ U_0, \{U_1(m_1), \dots, U_k(m_k)\}_m, U_{k+1}. \]

For the direct part, suppose for all $j \in [k]$, $M_j \geq e^{nR_j}$. If for all nonempty $S \subseteq [k]$,

\[ \sum_{j \in S} R_j > \sum_{j \in S} H(U_j | U_0) - H(U_S | U_0, U_{k+1}) + (8k - 2|S| + 10)\delta, \tag{1} \]

then

\[ \lim_{n \to \infty} P\{\exists m : (U_0^n, U_1^n(m_1), \dots, U_k^n(m_k), U_{k+1}^n) \in A_\delta^{(n)}\} = 1. \tag{2} \]

For the converse, assume for all $j \in [k]$, $M_j \leq e^{nR_j}$. If Equation (2) holds, then

\[ \sum_{j \in S} R_j \geq \sum_{j \in S} H(U_j | U_0) - H(U_S | U_0, U_{k+1}) - 2(|S| + 1)\delta, \]

for all nonempty $S \subseteq [k]$.


In the direct part of Theorem 1 we can weaken the lower bound on $\sum_{j \in S} R_j$
when $S = [k]$. Specifically, we can replace Equation (1) with

\[ \sum_{j=1}^{k} R_j > \sum_{j=1}^{k} H(U_j | U_0) - H(U_{[k]} | U_0, U_{k+1}) + 2(k + 1)\delta \]

for $S = [k]$.
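To make the rate condition of Theorem 1 concrete, the following Python sketch evaluates the right-hand side of Equation (1) for every nonempty S in a toy case. It assumes that U_0 and U_{k+1} are constant, so that the conditional entropies reduce to unconditional ones. The function names and the example distribution (a doubly symmetric binary pair) are hypothetical and chosen only for illustration. Entropies are in nats, matching the e^{nR_j} convention, and coordinates 0, ..., k-1 in the code stand for U_1, ..., U_k.

```python
import itertools
import math

def entropy(pmf):
    """Shannon entropy in nats of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    """Marginal of `joint` (dict: tuple -> prob) on the given coordinate indices."""
    out = {}
    for u, p in joint.items():
        key = tuple(u[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

def rate_thresholds(joint, k, delta):
    """Right-hand side of (1) for each nonempty S, assuming constant U_0 and U_{k+1}."""
    thresholds = {}
    for size in range(1, k + 1):
        for S in itertools.combinations(range(k), size):
            sum_marginals = sum(entropy(marginal(joint, (j,))) for j in S)
            joint_entropy = entropy(marginal(joint, S))
            slack = (8 * k - 2 * len(S) + 10) * delta
            thresholds[S] = sum_marginals - joint_entropy + slack
    return thresholds

# Hypothetical example: doubly symmetric binary pair with crossover probability q.
q = 0.1
joint = {(0, 0): (1 - q) / 2, (0, 1): q / 2, (1, 0): q / 2, (1, 1): (1 - q) / 2}
thresholds = rate_thresholds(joint, k=2, delta=0.0)
```

With delta = 0, the threshold for S = {1, 2} reduces to the mutual information I(U_1; U_2), and the thresholds for the two singletons are zero, which is the familiar form of the mutual covering condition.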


3. The Lower Bound 


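For every $S \subseteq [k]$, define $\mathcal{F}_S$ as the projection of $\mathcal{F}$ on $\mathcal{U}_0 \times \mathcal{U}_S \times \mathcal{U}_{k+1}$. Then
for every $(u_0, u_S, u_{k+1}) \in \mathcal{F}_S$, let $\mathcal{F}(u_0, u_S, u_{k+1})$ be the set of all $u_{S^c}$ such that
$(u_0, u_{[k]}, u_{k+1}) \in \mathcal{F}$. In addition, for every nonempty $S \subseteq [k]$, let $\alpha_S$ and $\beta_S$ be
constants such that

\[ \alpha_S \leq \log \frac{p(u_S | u_0, u_{k+1})}{\prod_{j \in S} p(u_j | u_0)} \]

for all $(u_0, u_S, u_{k+1}) \in \mathcal{F}_S$ and

\[ \beta_S \leq \log \frac{p(u_S | u_0, u_{S^c}, u_{k+1})}{\prod_{j \in S} p(u_j | u_0)} \]

for all $(u_0, u_S, u_{S^c}, u_{k+1}) \in \mathcal{F}$. Furthermore, let the constant $\gamma$ satisfy

\[ \gamma \geq \log \frac{p(u_{[k]} | u_0, u_{k+1})}{\prod_{j \in [k]} p(u_j | u_0)} \]

for all $(u_0, u_{[k]}, u_{k+1}) \in \mathcal{F}$.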

For every $m = (m_1, \dots, m_k) \in \mathcal{M}$, define the random variable $Z_m$ as

\[ Z_m = \mathbf{1}\{(U_0, U_1(m_1), \dots, U_k(m_k), U_{k+1}) \in \mathcal{F}\}, \]

and set

\[ Z = \sum_{m \in \mathcal{M}} Z_m. \]

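Our aim is to find a lower bound for $P\{Z = 0\}$. Note that for every nonempty
$S \subseteq [k]$,

\begin{align*}
P\{\exists m : Z_m = 1\} &= P\{\exists m : (U_0, U_1(m_1), \dots, U_k(m_k), U_{k+1}) \in \mathcal{F}\} \\
&\leq P\{\exists m : (U_0, (U_j(m_j))_{j \in S}, U_{k+1}) \in \mathcal{F}_S\} \\
&\leq |\mathcal{M}_S| \sum_{\mathcal{F}_S} p(u_0, u_{k+1}) \prod_{j \in S} p(u_j | u_0) \\
&\leq |\mathcal{M}_S| e^{-\alpha_S} \sum_{\mathcal{F}_S} p(u_0, u_S, u_{k+1}) \\
&\leq |\mathcal{M}_S| e^{-\alpha_S}.
\end{align*}

Thus

\begin{align*}
P\{Z = 0\} &= 1 - P\{\exists m : Z_m = 1\} \\
&\geq 1 - \min_{S \neq \emptyset} |\mathcal{M}_S| e^{-\alpha_S}. \tag{3}
\end{align*}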
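As a sanity check on the union-bound argument behind Equation (3), the following Python sketch computes the exact probability that some codeword pair lands in a target set and compares it with the bound for S = [k]. It is a toy instance under stated assumptions: k = 2, U_0 and U_{k+1} are constant, the two codebooks are drawn independently and i.i.d. from the marginals, and the joint distribution, alphabet size, codebook sizes, and target set are all hypothetical.

```python
import itertools
import math

def exact_hit_probability(p1, p2, M1, M2, F):
    """Exact P{exists (m1, m2) : (U_1(m1), U_2(m2)) in F} by enumerating codebooks."""
    total = 0.0
    for book1 in itertools.product(p1, repeat=M1):
        w1 = math.prod(p1[u] for u in book1)
        for book2 in itertools.product(p2, repeat=M2):
            if any((a, b) in F for a in book1 for b in book2):
                total += w1 * math.prod(p2[u] for u in book2)
    return total

# Hypothetical joint pmf on an 8 x 8 alphabet, concentrated on the diagonal.
n_sym = 8
joint = {(i, j): (0.118 if i == j else 0.001)
         for i in range(n_sym) for j in range(n_sym)}
p1 = {i: sum(joint[(i, j)] for j in range(n_sym)) for i in range(n_sym)}
p2 = {j: sum(joint[(i, j)] for i in range(n_sym)) for j in range(n_sym)}

F = {(i, i) for i in range(n_sym)}   # target set: the diagonal
M1, M2 = 2, 2                        # codebook sizes

# Largest admissible alpha_S for S = [k]: the minimum log-likelihood ratio over F.
alpha = min(math.log(joint[u] / (p1[u[0]] * p2[u[1]])) for u in F)
union_bound = M1 * M2 * math.exp(-alpha)   # |M_S| e^{-alpha_S}
exact = exact_hit_probability(p1, p2, M1, M2, F)
```

In this instance the exact probability of finding a pair in the target set stays below the bound, as the chain of inequalities requires; the bound is informative only when |M_S| e^{-alpha_S} is below one.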


4. The Upper Bound 

In deriving our upper bound on $P\{Z = 0\}$, we apply conditioning and Chebyshev’s
inequality. Thus, the factor

\[ \frac{1}{(P\{\mathcal{F}(u_0, u_{k+1})\})^2} \]

appears, where

\begin{align*}
P\{\mathcal{F}(u_0, u_{k+1})\} &= P\{U_{[k]} \in \mathcal{F}(u_0, u_{k+1}) \mid U_0 = u_0, U_{k+1} = u_{k+1}\} \\
&= \sum_{u_{[k]} \in \mathcal{F}(u_0, u_{k+1})} p(u_{[k]} | u_0, u_{k+1})
\end{align*}

and $\mathcal{F}(u_0, u_{k+1})$ (Section 3) is simply the set of all $u_{[k]}$'s that satisfy $(u_0, u_{[k]}, u_{k+1}) \in \mathcal{F}$.
Thus to get a reasonably accurate upper bound, we require $P\{\mathcal{F}(u_0, u_{k+1})\}$ to
be large. However, as we cannot guarantee this for all $(u_0, u_{k+1})$, we partition the
$(u_0, u_{k+1})$ pairs into “good” and “bad” sets, corresponding to large and small values
of $P\{\mathcal{F}(u_0, u_{k+1})\}$, respectively. The probability of the good set is large when
$P\{(U_0, U_{[k]}, U_{k+1}) \in \mathcal{F}\}$ is sufficiently large. To see this, fix $\epsilon > 0$ and, following
Appendix III of [5], define the set $\mathcal{G} \subseteq \mathcal{U}_0 \times \mathcal{U}_{k+1}$ as

\[ \mathcal{G} = \{(u_0, u_{k+1}) : P\{\mathcal{F}(u_0, u_{k+1})\} \geq 1 - \epsilon\}. \]

Note that $\mathcal{G}$ is the set of all good $(u_0, u_{k+1})$ pairs as defined above. We have

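\begin{align*}
P\{(U_0, U_{[k]}, U_{k+1}) \in \mathcal{F}\} &= \sum_{u_0, u_{k+1}} \sum_{u_{[k]} \in \mathcal{F}(u_0, u_{k+1})} p(u_0, u_{k+1}) p(u_{[k]} | u_0, u_{k+1}) \\
&= \sum_{u_0, u_{k+1}} p(u_0, u_{k+1}) P\{\mathcal{F}(u_0, u_{k+1})\} \\
&\leq (1 - \epsilon) P\{(U_0, U_{k+1}) \notin \mathcal{G}\} + P\{(U_0, U_{k+1}) \in \mathcal{G}\} \\
&= 1 - \epsilon P\{(U_0, U_{k+1}) \notin \mathcal{G}\}.
\end{align*}

Thus

\[ P\{(U_0, U_{k+1}) \notin \mathcal{G}\} \leq \frac{1}{\epsilon} P\{(U_0, U_{[k]}, U_{k+1}) \notin \mathcal{F}\}. \tag{4} \]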
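Inequality (4) is a Markov-type bound on the probability of the bad set, and it can be checked numerically. The Python sketch below does this for a randomly generated distribution. It is only an illustration: the single middle coordinate `u` stands in for the whole tuple u_{[k]}, and the alphabet sizes, the random target set, and the function name are assumptions made for the example.

```python
import itertools
import random

def check_good_set_bound(p_joint, F, eps):
    """Return (P{(U_0, U_{k+1}) not in G}, (1/eps) * P{not in F}) for a pmf over (u0, u, uk1)."""
    p_pair, mass_in_F = {}, {}
    for (u0, u, uk1), p in p_joint.items():
        # p(u0, uk1), and the part of that mass whose triple lies in F
        p_pair[(u0, uk1)] = p_pair.get((u0, uk1), 0.0) + p
        if (u0, u, uk1) in F:
            mass_in_F[(u0, uk1)] = mass_in_F.get((u0, uk1), 0.0) + p
    # Bad pairs: the conditional probability of the slice F(u0, uk1) is below 1 - eps.
    p_bad = sum(p for pair, p in p_pair.items()
                if p > 0 and mass_in_F.get(pair, 0.0) / p < 1 - eps)
    p_not_F = 1.0 - sum(mass_in_F.values())
    return p_bad, p_not_F / eps

# Hypothetical random instance with a fixed seed for reproducibility.
rng = random.Random(0)
support = list(itertools.product(range(3), range(4), range(3)))
weights = [rng.random() for _ in support]
total = sum(weights)
p_joint = {x: w / total for x, w in zip(support, weights)}
F = {x for x in support if rng.random() < 0.9}

p_bad, bound = check_good_set_bound(p_joint, F, eps=0.3)
```

The first returned value is the probability of the bad set and the second is the right-hand side of (4); the first never exceeds the second, whatever distribution, target set, and epsilon in (0, 1) are chosen.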

Our aim is to find an upper bound for $P\{Z = 0\}$. To do this, we write

\begin{align*}
P\{Z = 0\} &= \sum_{u_0, u_{k+1}} p(u_0, u_{k+1}) P\{Z = 0 | u_0, u_{k+1}\} \\
&\leq \frac{1}{\epsilon} P\{(U_0, U_{[k]}, U_{k+1}) \notin \mathcal{F}\} + \sum_{(u_0, u_{k+1}) \in \mathcal{G}} p(u_0, u_{k+1}) P\{Z = 0 | u_0, u_{k+1}\}, \tag{5}
\end{align*}

where the inequality follows from Equation (4). Therefore, to find an upper bound
on $P\{Z = 0\}$, it suffices to find an upper bound on $P\{Z = 0 | U_0 = u_0, U_{k+1} = u_{k+1}\}$
for all $(u_0, u_{k+1}) \in \mathcal{G}$. Fix $(u_0, u_{k+1}) \in \mathcal{G}$. We use Chebyshev’s inequality
to find an upper bound on $P\{Z = 0 | U_0 = u_0, U_{k+1} = u_{k+1}\}$. Thus we need to
calculate $E[Z | U_0 = u_0, U_{k+1} = u_{k+1}]$ and $E[Z^2 | U_0 = u_0, U_{k+1} = u_{k+1}]$. For a given
$m$, from the definition of $\gamma$ (Section 3) it follows

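\begin{align*}
E[Z_m | u_0, u_{k+1}] &= P\{(U_1(m_1), \dots, U_k(m_k)) \in \mathcal{F}(u_0, u_{k+1}) \mid u_0, u_{k+1}\} \\
&= \sum_{\mathcal{F}(u_0, u_{k+1})} p(u_1 | u_0) \cdots p(u_k | u_0) \\
&\geq e^{-\gamma} \sum_{\mathcal{F}(u_0, u_{k+1})} p(u_{[k]} | u_0, u_{k+1}) \\
&= e^{-\gamma} P\{\mathcal{F}(u_0, u_{k+1})\} \geq (1 - \epsilon) e^{-\gamma},
\end{align*}

where the last inequality follows from the fact that $(u_0, u_{k+1}) \in \mathcal{G}$. Thus, by
linearity of expectation,

\[ E[Z | U_0 = u_0, U_{k+1} = u_{k+1}] \geq |\mathcal{M}| e^{-\gamma} (1 - \epsilon). \tag{6} \]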

Next, we find an upper bound on $E[Z^2 | U_0 = u_0, U_{k+1} = u_{k+1}]$. We have

\[ Z^2 = \sum_m Z_m^2 + \sum_{m \neq m'} Z_m Z_{m'} = Z + \sum_{m \neq m'} Z_m Z_{m'}, \]

since $Z_m^2 = Z_m$ and $Z = \sum_m Z_m$. Thus

\[ E[Z^2 | u_0, u_{k+1}] = E[Z | u_0, u_{k+1}] + \sum_{m \neq m'} E[Z_m Z_{m'} | u_0, u_{k+1}]. \]

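For any pair of distinct $m$ and $m'$ with nonempty $S = S_{m,m'}$, we have

\begin{align*}
E[Z_m Z_{m'} | u_0, u_{k+1}] &= \sum_{u_S \in \mathcal{F}_S(u_0, u_{k+1})} \prod_{j \in S} p(u_j | u_0) \Big( \sum_{u_{S^c} \in \mathcal{F}(u_0, u_S, u_{k+1})} \prod_{j \in S^c} p(u_j | u_0) \Big)^2 \\
&\leq e^{-\alpha_S - 2\beta_{S^c}} \sum_{u_S \in \mathcal{F}_S(u_0, u_{k+1})} p(u_S | u_0, u_{k+1}) \Big( \sum_{u_{S^c} \in \mathcal{F}(u_0, u_S, u_{k+1})} p(u_{S^c} | u_0, u_S, u_{k+1}) \Big)^2 \\
&\leq e^{-\alpha_S - 2\beta_{S^c}},
\end{align*}

where $\mathcal{F}_S(u_0, u_{k+1})$ is the set of all $u_S$ that satisfy $(u_0, u_S, u_{k+1}) \in \mathcal{F}_S$. On the other
hand, if $S = S_{m,m'}$ is empty, then $Z_m$ and $Z_{m'}$ are independent given $(U_0, U_{k+1}) =
(u_0, u_{k+1})$, and

\[ E[Z_m Z_{m'} | u_0, u_{k+1}] = (E[Z_m | u_0, u_{k+1}])^2. \]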

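Thus (assume $|\mathcal{M}_\emptyset| = 1$)

\begin{align*}
E[Z^2 | u_0, u_{k+1}] &= E[Z | u_0, u_{k+1}] + \sum_{S \subset [k]} |\mathcal{M}_S| \prod_{j \in S^c} \big( |\mathcal{M}_{\{j\}}|^2 - |\mathcal{M}_{\{j\}}| \big) E[Z_m Z_{m'} | u_0, u_{k+1}] \\
&\leq E[Z | u_0, u_{k+1}] + (E[Z | u_0, u_{k+1}])^2 + \sum_{\emptyset \subset S \subset [k]} |\mathcal{M}_S| |\mathcal{M}_{S^c}|^2 e^{-\alpha_S - 2\beta_{S^c}}, \tag{7}
\end{align*}

where the notation $\emptyset \subset S \subset [k]$ means that $S$ is a nonempty proper subset of $[k]$.
We have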


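\begin{align*}
P\{Z = 0 | u_0, u_{k+1}\} &\leq P\big\{ |Z - E[Z | u_0, u_{k+1}]| \geq E[Z | u_0, u_{k+1}] \,\big|\, u_0, u_{k+1} \big\} \\
&\overset{(a)}{\leq} \frac{\mathrm{Var}(Z | u_0, u_{k+1})}{(E[Z | u_0, u_{k+1}])^2} = \frac{E[Z^2 | u_0, u_{k+1}]}{(E[Z | u_0, u_{k+1}])^2} - 1 \\
&\overset{(b)}{\leq} \frac{1}{1 - \epsilon} |\mathcal{M}|^{-1} e^{\gamma} + \frac{1}{(1 - \epsilon)^2} \sum_{\emptyset \subset S \subset [k]} |\mathcal{M}_S|^{-1} e^{-\alpha_S - 2\beta_{S^c} + 2\gamma},
\end{align*}

where (a) follows from Chebyshev’s inequality and (b) follows from Equations (6)
and (7). Now using Equation (5), we get

\[ P\{Z = 0\} \leq \frac{1}{\epsilon} P\{\mathcal{F}^c\} + \frac{1}{1 - \epsilon} |\mathcal{M}|^{-1} e^{\gamma} + \frac{1}{(1 - \epsilon)^2} \sum_{\emptyset \subset S \subset [k]} |\mathcal{M}_S|^{-1} e^{-\alpha_S - 2\beta_{S^c} + 2\gamma}. \tag{8} \]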

5. The Asymptotic Result 

In this section, using our lower and upper bounds, we prove Theorem 1. We first
prove the direct part using our upper bound from Section 4. Set $\mathcal{F} = A_\delta^{(n)}$ and for
every $j \in [k]$, choose an integer $M_j \geq e^{nR_j}$. Choose a sequence $\{\epsilon_n\}_n$ such that

\[ \lim_{n \to \infty} \frac{1}{\epsilon_n} P\{(A_\delta^{(n)})^c\} = 0. \]

This is simple to do, since $P\{(A_\delta^{(n)})^c\}$ decays exponentially in $n$ (see Appendix B).
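Fix a nonempty $S \subseteq [k]$. Notice that if $(u_0^n, (u_j^n)_{j \in S}, u_{k+1}^n) \in \mathcal{F}_S$, then

\[ \Big| \log \frac{p(u_S^n | u_0^n, u_{k+1}^n)}{\prod_{j \in S} p(u_j^n | u_0^n)} - n \Big( \sum_{j \in S} H(U_j | U_0) - H(U_S | U_0, U_{k+1}) \Big) \Big| \leq 2n(|S| + 1)\delta. \]

Thus we may choose

\[ \alpha_S = n \Big( \sum_{j \in S} H(U_j | U_0) - H(U_S | U_0, U_{k+1}) - 2(|S| + 1)\delta \Big) \]

and

\[ \gamma = n \Big( \sum_{j=1}^{k} H(U_j | U_0) - H(U_{[k]} | U_0, U_{k+1}) + 2(k + 1)\delta \Big). \]

Similarly, for every nonempty $S \subseteq [k]$, we choose $\beta_S$ as

\[ \beta_S = n \Big( \sum_{j \in S} H(U_j | U_0) - H(U_S | U_0, U_{S^c}, U_{k+1}) - 2(|S| + 1)\delta \Big), \]

since for every $(u_0^n, (u_j^n)_{j \in S}, (u_j^n)_{j \in S^c}, u_{k+1}^n) \in \mathcal{F}$,

\[ \Big| \log \frac{p(u_S^n | u_0^n, u_{S^c}^n, u_{k+1}^n)}{\prod_{j \in S} p(u_j^n | u_0^n)} - n \Big( \sum_{j \in S} H(U_j | U_0) - H(U_S | U_0, U_{S^c}, U_{k+1}) \Big) \Big| \leq 2n(|S| + 1)\delta. \]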
jes 
From our upper bound, Equation (8), it now follows that if for all nonempty $S \subset [k]$,

\begin{align*}
\sum_{j \in S} R_j &> \frac{1}{n} (2\gamma - \alpha_S - 2\beta_{S^c}) \\
&= 2 \sum_{j=1}^{k} H(U_j | U_0) - 2H(U_{[k]} | U_0, U_{k+1}) - \sum_{j \in S} H(U_j | U_0) + H(U_S | U_0, U_{k+1}) \\
&\quad - 2 \sum_{j \in S^c} H(U_j | U_0) + 2H(U_{S^c} | U_0, U_S, U_{k+1}) + (8k - 2|S| + 10)\delta \\
&= \sum_{j \in S} H(U_j | U_0) - H(U_S | U_0, U_{k+1}) + (8k - 2|S| + 10)\delta,
\end{align*}

and for $S = [k]$,

\[ \sum_{j=1}^{k} R_j > \frac{1}{n} \gamma = \sum_{j=1}^{k} H(U_j | U_0) - H(U_{[k]} | U_0, U_{k+1}) + 2(k + 1)\delta, \]

then

\[ \lim_{n \to \infty} P\{\exists m : (U_0^n, U_1^n(m_1), \dots, U_k^n(m_k), U_{k+1}^n) \in A_\delta^{(n)}\} = 1. \tag{9} \]
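The coefficient of delta in the direct-part condition collects contributions from gamma, alpha_S, and beta_{S^c}. The short Python sketch below, with a hypothetical function name, tallies those three contributions for |S| = s so that the total can be compared with 8k - 2|S| + 10.

```python
def slack_coefficient(k, s):
    """delta-coefficient of (2*gamma - alpha_S - 2*beta_{S^c}) / n when |S| = s."""
    gamma_part = 2 * 2 * (k + 1)        # 2*gamma contributes +4(k + 1)
    alpha_part = 2 * (s + 1)            # -alpha_S contributes +2(|S| + 1)
    beta_part = 2 * 2 * ((k - s) + 1)   # -2*beta_{S^c} contributes +4(|S^c| + 1)
    return gamma_part + alpha_part + beta_part

# Example: k = 3 codebooks and a subset S of size 2.
coefficient = slack_coefficient(3, 2)
```

For k = 3 and |S| = 2 the three parts are 16, 6, and 8, which sum to 30 = 8(3) - 2(2) + 10.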

Next we prove the converse. Suppose for each $j \in [k]$, $M_j \leq e^{nR_j}$ and Equation
(9) holds. Then from our lower bound, Equation (3), it follows

\[ \sum_{j \in S} R_j \geq \frac{1}{n} \alpha_S = \sum_{j \in S} H(U_j | U_0) - H(U_S | U_0, U_{k+1}) - 2(|S| + 1)\delta, \]

for all nonempty $S \subseteq [k]$.


Appendix A. Cauchy-Schwarz Inequality 

Let $Z$ be any random variable that is nonnegative with probability one and has
positive first and second moments. Then

\[ Z = Z \mathbf{1}\{Z > 0\} \]

almost surely. Thus

\begin{align*}
E[Z] &= E[Z \mathbf{1}\{Z > 0\}] \\
&\leq \sqrt{E[Z^2] \times P\{Z > 0\}},
\end{align*}

where the inequality follows from Cauchy-Schwarz. Hence

\[ P\{Z > 0\} \geq \frac{(E[Z])^2}{E[Z^2]} \]

and

\[ P\{Z = 0\} \leq 1 - \frac{(E[Z])^2}{E[Z^2]}. \]

On the other hand, using Chebyshev’s inequality we get

\begin{align*}
P\{Z = 0\} &\leq P\{|Z - E[Z]| \geq E[Z]\} \\
&\leq \frac{\mathrm{Var}(Z)}{(E[Z])^2} = \frac{E[Z^2]}{(E[Z])^2} - 1.
\end{align*}

Now note that the bound resulting from Cauchy-Schwarz is stronger, since for any
$t > 0$,

\[ 1 - \frac{1}{t} \leq t - 1, \]

which we apply with $t = E[Z^2] / (E[Z])^2$.
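The gap between the two bounds is easy to see numerically. The Python sketch below evaluates both for a binomial random variable, for which the exact value of P{Z = 0} is also available. The binomial model and its parameters are assumptions chosen for illustration only; in the body of the paper Z is a sum of dependent indicators.

```python
import math

def zero_probability_bounds(n, p):
    """Exact P{Z = 0} and the two upper bounds for Z ~ Binomial(n, p)."""
    mean = n * p
    second_moment = n * p * (1 - p) + mean ** 2
    exact = (1 - p) ** n
    cauchy_schwarz = 1 - mean ** 2 / second_moment   # 1 - (E[Z])^2 / E[Z^2]
    chebyshev = second_moment / mean ** 2 - 1        # E[Z^2] / (E[Z])^2 - 1
    return exact, cauchy_schwarz, chebyshev

exact, cs, cheb = zero_probability_bounds(n=20, p=0.1)
```

Here E[Z] = 2 and E[Z^2] = 5.8, so the Cauchy-Schwarz bound is 1 - 4/5.8 (about 0.31) and the Chebyshev bound is 0.45, while the exact value 0.9^20 is about 0.12.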


Appendix B. Large Deviations 


The moment generating function of a random variable $X$ is defined as

\[ M(t) = E[e^{tX}] \]

for all real $t$ for which the expectation on the right hand side is finite. If $M$ is defined
on a neighborhood of 0, say $(-t_0, t_0)$ for some $t_0 > 0$, then it has a Taylor series
expansion with a positive radius of convergence [1, pp. 278-280]. In particular,

\[ \frac{d}{dt} M(t) \Big|_{t=0} = E[X]. \]

We want to find an upper bound for $P\{X \geq a\}$ for some real number $a$. Choose
$t > 0$. Using Markov’s inequality, we get

\begin{align*}
P\{X \geq a\} &= P\{tX \geq ta\} \\
&= P\{e^{tX} \geq e^{ta}\} \\
&\leq e^{-ta} E[e^{tX}] \\
&= e^{\log M(t) - ta}.
\end{align*}

Since $t > 0$ was arbitrary, we get

\[ P\{X \geq a\} \leq e^{\inf_{t > 0} (\log M(t) - ta)}. \]

Define the function $f$ as

\[ f(t) = \log M(t) - ta. \]

Then $f(0) = 0$ and $f'(0) = E[X] - a$. Thus if $a > E[X]$,

\[ \inf_{t > 0} (\log M(t) - ta) < 0. \tag{10} \]

If we apply the same inequality to the random variable

\[ \sum_{i=1}^{n} X_i, \]

where the $X_i$'s are i.i.d. copies of $X$, we get

\[ P\Big\{ \sum_{i=1}^{n} X_i \geq na \Big\} \leq e^{n \inf_{t > 0} (\log M(t) - ta)}. \tag{11} \]
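To illustrate inequality (11), the following Python sketch evaluates the exponent for Bernoulli random variables by a simple grid search over t and compares the resulting bound with the exact binomial tail. The Bernoulli model, the grid parameters, and the function names are assumptions made for the example. Because every grid point t > 0 gives a valid bound, the grid minimum yields a valid, slightly conservative, exponent.

```python
import math

def chernoff_exponent(p, a, grid=20000, t_max=20.0):
    """Approximate inf_{t>0} (log M(t) - t*a) for X ~ Bernoulli(p) on a grid of t values."""
    best = 0.0  # the limit t -> 0 gives exponent 0, i.e. the trivial bound 1
    for i in range(1, grid + 1):
        t = t_max * i / grid
        log_mgf = math.log(1 - p + p * math.exp(t))
        best = min(best, log_mgf - t * a)
    return best

def exact_upper_tail(n, p, a):
    """Exact P{X_1 + ... + X_n >= n*a} for i.i.d. Bernoulli(p) variables."""
    threshold = math.ceil(n * a - 1e-12)
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(threshold, n + 1))

p, a, n = 0.3, 0.5, 40
exponent = chernoff_exponent(p, a)
bound = math.exp(n * exponent)   # right-hand side of (11)
tail = exact_upper_tail(n, p, a)
```

For Bernoulli variables the infimum equals minus the Kullback-Leibler divergence between Bernoulli(a) and Bernoulli(p), so the grid value can be checked against that closed form. Since a > p, the exponent is strictly negative, in line with (10).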

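Now consider a random vector $(U_1, \dots, U_k)$ with distribution $p(u_1, \dots, u_k)$. For
every nonempty $S \subseteq [k]$, let $U_S$ denote the random vector $(U_j)_{j \in S}$. Let $(U_1^n, \dots, U_k^n)$
be $n$ i.i.d. copies of $(U_1, \dots, U_k)$. By applying inequality (11) to the random variables
$\{\log \frac{1}{p(U_{S,i})}\}_{i=1}^{n}$ and setting $a = H(U_S) + \epsilon$ for some $\epsilon > 0$, we get

\[ P\Big\{ \sum_{i=1}^{n} \log \frac{1}{p(U_{S,i})} \geq n(H(U_S) + \epsilon) \Big\} \leq e^{-n I_S(\epsilon)}, \tag{12} \]

where $I_S(\epsilon)$ is given by

\[ I_S(\epsilon) = \sup_{t > 0} \big\{ t(H(U_S) + \epsilon) - \log E[p(U_S)^{-t}] \big\}. \]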
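By the union bound we get

\begin{align*}
P\{(U_1^n, \dots, U_k^n) \notin A_\epsilon^{(n)}(U_1, \dots, U_k)\} &\leq 2 \sum_{\emptyset \subset S \subseteq [k]} e^{-n I_S(\epsilon)} \\
&\leq 2(2^k - 1) e^{-n \min_S I_S(\epsilon)} \\
&\leq e^{-n I(\epsilon)},
\end{align*}

where

\[ I(\epsilon) = \min_{S \subseteq [k]} I_S(\epsilon) + O\Big(\frac{1}{n}\Big). \]

Finally, note that by Equation (10), each $I_S(\epsilon)$ is positive, thus so is $I(\epsilon)$ for all sufficiently large $n$.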